Abstract: To reduce the implementation complexity of a belief
propagation (BP) based low-density parity-check (LDPC)
decoder, shuffled BP decoding schedules, which serialize the 
decoding process by dividing a complete parallel message-passing 
iteration into a sequence of sub-iterations, have been proposed. 
The so-called group horizontal shuffled BP algorithm partitions 
the check nodes of the code graph into groups to perform 
group-by-group message-passing decoding. This paper proposes 
a new grouping technique to accelerate the message-passing rate.
The performance of the proposed algorithm is analyzed by a Gaussian
approximation approach. Both analysis and numerical experiments
verify that the new algorithm converges faster than the existing
conventional and group shuffled BP decoders under the same
computational complexity constraint.

I. Introduction 

Low-density parity-check (LDPC) codes with belief propa- 
gation (BP) or so-called sum-product algorithm (SPA) based 
decoder can offer near-capacity performance. The SPA de- 
coder, however, suffers from low convergence rate and high 
implementation complexity. To improve the rate of conver- 
gence and reduce implementation cost, serialized BP decoding 
algorithms which partition either the variable nodes (VNs) Q 
or the check nodes (CNs) of the corresponding bipartite 
graph into multiple groups were introduced. These two classes 
of serial SPA algorithms are called vertical and horizontal 
group shuffled BP decoding algorithms, respectively. More 
recent related works can be found in -(51. These prac- 
tical alternatives use serial-parallel decoding schedules that 
perform sequential group-wise message-passings and have the 
advantage of obtaining more reliable extrinsic messages for 
subsequent decoding within an iteration. 

In this paper, we focus on the horizontal group shuffled 
BP decoding algorithms as they provide more advantages in 
hardware implementation (TJ 0. For the sake of brevity, 
group shuffled BP (GSBP) stands for horizontal group shuffled 
BP (HGSBP) throughout this paper. For conventional GSBP 
schedules, the CNs are divided into a number of groups such 
that each CN belongs to just one group. A decoding iteration 
consists of several sub-iterations. Each sub-iteration updates 
in parallel the log-likelihood ratios (LLR) associated with the 
VNs connecting to the CNs in the same group. Hence within 
a sub-iteration, message-passing is performed on the bipartite 
subgraph that consists of the CNs of a group and all the VNs 
connecting to these CNs. Unlike conventional group shuffled
(GS) schedules which partition either VNs or CNs into disjoint
groups, we propose a GS decoding schedule which divides
CNs into non-disjoint CN groups. Such a CN grouping results
in larger connectivity of consecutive subgraphs (CoCSG)
associated with two neighboring CN groups, where the CoCSG,
denoted by ℓ, refers to the average number of VNs that
connect to the CNs of, say, the kth group and are also
linked to the CNs of the previous, i.e., (k - 1)th,
CN group. A larger CoCSG means more information is
forwarded from the previous sub-iteration, which provides
opportunities for improved decoding performance. We demonstrate
by both simulation and analysis that the proposed
GSBP is indeed capable of offering significant performance
gain and additional tradeoffs among performance, complexity
and decoding delay. Since our division of the CNs yields CN groups
with a nonempty intersection for any two neighboring groups,
we refer to the resulting decoding schedule as non-disjoint
group-shuffled belief propagation (NDGSBP) in the subsequent
discourse.

To analyze the performance of iterative LDPC decoding
algorithms in binary-input additive white Gaussian noise
(BI-AWGN) channels, approaches such as density evolution (DE),
Gaussian approximation (GA), and extrinsic information transfer
(EXIT) charts have been proposed [8]-[11]. We adopt the
GA approach [11] as it requires just the tracking of the first
two moments which are sufficient to completely characterize 
the probability densities. Moreover, if a consistency condition 
is met (Til , we need to track only the means of related 
likelihood parameters. 

The rest of this paper is organized as follows. In Section II
we explain the basic idea of the new grouping method, provide
relevant parameter definitions and present the NDGSBP
decoding algorithm. The corresponding GA-based performance
analysis is given in Section III. Section IV provides numerical
performance examples of our algorithm, estimated by
both computer simulations and analysis. Finally, concluding
remarks are drawn in Section V.

II. Non-Disjoint Group Shuffled Belief 
Propagation Algorithm 

A. Why GS decoding with non-disjoint groups? 

Consider the decoding sub-iteration which performs VN-to-CN
and then CN-to-VN message passing for the CNs of the
kth group and all connecting VNs. If (at least) one of the VNs
is linked to some CNs in other (CN) groups which have been
processed earlier in the same decoding iteration (i.e., whose
group indices are smaller than k), then the other connecting VNs
which have no such links will benefit from receiving more
newly updated messages. We use a simple linear code and
its associated Tanner graph shown in Fig. 1, where there are
four CNs {c_1, c_2, c_3, c_4} and eight VNs {v_1, v_2, ..., v_7, v_8},
to explain this effect. Let the messages the VNs carry be
denoted by m_1, m_2, ..., m_7, m_8. In a conventional BP decoding
iteration, each VN receives the messages from its neighboring
VNs which are linked through some CNs. For instance, v_4
and v_6 are updated by the messages {m_2, m_5, m_6, m_7} and
{m_4, m_7}, respectively. For GSBP decoding with two CN
groups {c_1, c_2} and {c_3, c_4}, v_4 receives {m_2, m_5} in the
first sub-iteration and {m_2, m_5, m_6, m_7} in the second
sub-iteration, while v_6 is updated by {m_2, m_4, m_5, m_7}, in which
m_2 and m_5 are the messages forwarded by v_4 because of its
connection to the second CN group; these help improve
the convergence. Obviously, the amount of messages the CNs
in the kth group receive from VNs connected to CNs belonging
to the jth group, j < k, depends on the code structure and the
grouping of CNs. If we limit our attention to the case j = k - 1,
the single parameter ℓ defined in the introductory section can
be used to quantify the average amount of messages received
from the previous sub-iteration, and the grouping should try to
maximize this number.

Fig. 1. The Tanner graph of a linear block code.
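
As a rough illustration of how the CoCSG could be evaluated for a given grouping, the following Python sketch counts, for each pair of consecutive CN groups, the VNs connected to both groups and averages the counts. The 4 x 8 matrix H is a hypothetical toy example (it is not the code of Fig. 1), and the function names connected_vns and cocsg are assumptions made for this sketch.

```python
def connected_vns(H, group):
    """Set of VN indices connected to at least one CN (row of H) in `group`."""
    return {n for m in group for n, h in enumerate(H[m]) if h}


def cocsg(H, groups):
    """Average number of VNs shared by the subgraphs of consecutive CN groups."""
    counts = [
        len(connected_vns(H, prev) & connected_vns(H, cur))
        for prev, cur in zip(groups, groups[1:])
    ]
    return sum(counts) / len(counts)


# Hypothetical toy parity-check matrix: 4 CNs (rows), 8 VNs (columns).
H = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 1, 1],
]

disjoint = [[0, 1], [2, 3]]             # conventional GSBP grouping
overlapping = [[0, 1], [1, 2], [2, 3]]  # non-disjoint grouping, r = 0.5

print(cocsg(H, disjoint))     # 2.0
print(cocsg(H, overlapping))  # 3.0
```

On this toy matrix the non-disjoint grouping shares more VNs between consecutive subgraphs than the disjoint one, which is the effect the grouping is meant to exploit.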

To simplify our systematic non-disjoint grouping method,
we assume identical group cardinality, N_G, and denote the
number of CN groups by G, so that G × N_G = M is the
number of CNs when the groups are disjoint. We define the
overlapping ratio r as the ratio between the size of the
intersection of two neighboring CN groups and N_G. Then we
have G N_G - (G - 1) N_G r = M.
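
The relation between M, G, r and the group size can be checked numerically. The sketch below solves G N_G - (G - 1) N_G r = M for N_G and rounds up to an integer; the rounding rule and the function name group_size are assumptions of this sketch, chosen because they reproduce the value N_G = 34 quoted in Section IV for M = 252, G = 12 and r = 0.4.

```python
import math


def group_size(M, G, r):
    """Smallest integer N_G with G*N_G - (G-1)*N_G*r >= M (rounding up is an assumption)."""
    return math.ceil(M / (G - (G - 1) * r))


print(group_size(252, 12, 0.4))  # 34, the (504,252) code with overlap r = 0.4
print(group_size(252, 12, 0.0))  # 21, the disjoint (GSBP) case
```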

We arbitrarily select N_G CNs to form the first CN group.
The kth (k > 1) group includes r · N_G CNs randomly
chosen from the (k - 1)th group and (1 - r) · N_G CNs from
the CNs which do not belong to any of the earlier groups.
Therefore, a CN does not necessarily belong to only one
group anymore. As an illustration, we consider the grouping
(r, G, N_G) = (0.5, 3, 2) on the Tanner graph of Fig. 1 again.
Let the first group be {c_1, c_2}, the second one be {c_2, c_3} and
the third one be {c_3, c_4}. In the first sub-iteration, v_2 and v_4
receive {m_1, m_3, m_4, m_5} and {m_2, m_5}, respectively. v_4 and
v_6 receive {m_1, m_2, m_3, m_5, m_6, m_7} and {m_2, m_4, m_5, m_7}
in the second sub-iteration. In the final sub-iteration, v_6 is
updated by {m_1, m_2, m_3, m_4, m_5, m_7}. In short, for
conventional BP, a VN can only collect information from VNs which
are two edges away in one iteration; for GSBP decoding, a VN
has the opportunity to obtain the messages from VNs four edges
apart; and for the proposed NDGSBP decoding algorithm,
it is possible that a VN obtains the messages from VNs which
are more than six edges away if we select the overlapping ratio
and CNs carefully. With a fixed degree of parallelism N_G and
CN number M, a larger r requires more groups, and hence more
sub-iterations, to cover all M CNs, so the per-iteration delay
becomes longer; at the same time fewer iterations are required,
as a VN can update its LLR using information
from more VNs. The product of the required iteration number
and the per-iteration delay equals the total decoding delay to
achieve a predetermined error rate performance. Section IV
shows that the NDGSBP algorithm does give improved error
rate performance for the same decoding delay.
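
The grouping rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name ndgsbp_groups is hypothetical, the overlap size is taken as floor(r · N_G) so that all M CNs are guaranteed to be covered, the last group may be smaller than N_G, and the overlap is drawn from the newly added CNs of the previous group when enough of them exist (falling back to the whole previous group otherwise).

```python
import math
import random


def ndgsbp_groups(M, G, r, seed=None):
    """Build G non-disjoint CN groups over CN indices 0..M-1 (illustrative sketch)."""
    rng = random.Random(seed)
    n_g = math.ceil(M / (G - (G - 1) * r))  # group size N_G (rounded up)
    n_overlap = math.floor(r * n_g)         # CNs reused from the previous group
    unused = list(range(M))                 # the set U of not-yet-grouped CNs
    rng.shuffle(unused)
    groups = []
    for g in range(G):
        if g == 0:
            overlap = []
        else:
            prev = groups[-1]
            prev2 = set(groups[-2]) if g >= 2 else set()
            # Prefer CNs that were new in the previous group.
            fresh_prev = [c for c in prev if c not in prev2]
            pool = fresh_prev if len(fresh_prev) >= n_overlap else prev
            overlap = rng.sample(pool, n_overlap)
        n_new = min(n_g - len(overlap), len(unused))
        new = [unused.pop() for _ in range(n_new)]
        groups.append(overlap + new)
    return groups


groups = ndgsbp_groups(252, 12, 0.4, seed=1)
print([len(q) for q in groups])
```

Each call corresponds to one execution of the grouping step; a decoder that regroups in every iteration would simply call the function again with a different seed.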

B. Basic definitions and notations 

A binary (N, K) LDPC code C is a linear block code
whose M × N parity-check matrix H = [H_mn] is sparse, i.e., has
few nonzero elements. H and thus C can be viewed as a bipartite
graph with N VNs corresponding to the encoded bits, and M
CNs corresponding to the parity-check functions represented
by the rows of H. Given the above code parameters, the 
two parameters r and t are related by £ > • Nq • r. 
More information is needed before an exact relation can 
be established. To track the statistical property variations of
the message-passing sequence between VNs and CNs in an
iterative decoding schedule, we also need to know the VN
and CN degree-distribution polynomials
$\lambda(x) = \sum_{i=2}^{d_v} \lambda_i x^{i-1}$ and
$\rho(x) = \sum_{j=2}^{d_c} \rho_j x^{j-1}$, where $\lambda_i$ and
$\rho_j$ denote the fractions of all edges connected to degree-i
VNs and degree-j CNs, and d_v and d_c denote the maximum VN
and CN degrees.
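
For a concrete parity-check matrix, the edge-perspective coefficients of the two polynomials can be tabulated directly from the row and column weights. The sketch below is illustrative only: the function name edge_degree_distribution and the toy matrix are assumptions, and the toy matrix contains degree-1 VNs, which the polynomials above (starting at degree 2) would exclude.

```python
from collections import Counter


def edge_degree_distribution(H):
    """Return (lam, rho): degree -> fraction of edges attached to VNs / CNs of that degree."""
    M, N = len(H), len(H[0])
    col_deg = [sum(H[m][n] for m in range(M)) for n in range(N)]  # VN degrees
    row_deg = [sum(row) for row in H]                             # CN degrees
    edges = sum(row_deg)
    lam, rho = Counter(), Counter()
    for d in col_deg:
        lam[d] += d  # a degree-d VN owns d edges
    for d in row_deg:
        rho[d] += d  # a degree-d CN owns d edges
    lam = {d: c / edges for d, c in lam.items() if d > 0}
    rho = {d: c / edges for d, c in rho.items() if d > 0}
    return lam, rho


# Hypothetical toy parity-check matrix: 4 CNs (rows), 8 VNs (columns).
H_toy = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 1, 1],
]
lam, rho = edge_degree_distribution(H_toy)
print(lam, rho)
```

For a regular code with d_v = 3 and d_c = 6, the same function would return a single entry per dictionary, corresponding to λ(x) = x^2 and ρ(x) = x^5.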

Let N(m) be the set of variable nodes that participate in
check node m and M(n) be the set of check nodes that are
connected to variable node n in the code graph. N(m)\n is
defined as the set N(m) with the variable node n excluded,
while M(n)\m is the set M(n) with the check node m
excluded. Let L_{n→m} be the message sent from VN n to CN
m and L_{m→n} be the message sent from CN m to VN n.

C. System model and decoding schedule 

Assume a codeword C = (c_1, c_2, ..., c_N) is BPSK-modulated
and transmitted over an AWGN channel with noise
variance σ^2. Let Y = (y_1, y_2, ..., y_N) be the corresponding
received sequence and L_n be the log-likelihood ratio (LLR) of
the variable node n with the initial value given by L_n = (2/σ^2) y_n.
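
The channel model and LLR initialization can be sketched as below. The function names are hypothetical, and the mapping of bit 0 to +1 follows the all-zero codeword convention used in Section III; the random generator and seed handling are choices made for this sketch only.

```python
import random


def channel_llrs(y, sigma2):
    """Initial LLRs L_n = (2 / sigma^2) * y_n for BPSK over AWGN."""
    return [2.0 * yn / sigma2 for yn in y]


def bpsk_awgn(bits, sigma2, seed=None):
    """Map bits 0/1 to +1/-1 and add Gaussian noise of variance sigma2."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    return [(1 - 2 * b) + rng.gauss(0.0, sigma) for b in bits]


print(channel_llrs([1.0, -0.5], 0.5))  # [4.0, -2.0]
y = bpsk_awgn([0] * 8, 0.5, seed=7)
print(channel_llrs(y, 0.5))
```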

Let Q_g be the gth CN group, 1 ≤ g ≤ G, U be a set
of CNs, l be the iteration counter and I_Max be the maximum
number of iterations. We can then describe the NDGSBP
algorithm as follows:

Initialization

Set l = 1, U = {x | 1 ≤ x ≤ M}, and Q_g = ∅ for 1 ≤ g ≤ G.

Step 1: Grouping check nodes

Collect N_G elements randomly from the set U to form Q_1 and let
U = U \ Q_1. Collect N_G - N_G · r elements randomly from the set
U and N_G · r elements from Q_1 to create Q_2. For 3 ≤ g ≤ G,
collect N_G - N_G · r elements randomly from the set U and
N_G · r elements from Q_{g-1} \ Q_{g-2} to create Q_g, and let
U = U \ Q_g.

Step 2: Message passing

For 1 ≤ g ≤ G

a) CN update: ∀ m ∈ Q_g, n ∈ N(m)

$$ L_{m \to n} = 2 \tanh^{-1}\left( \prod_{n' \in N(m) \setminus n} \tanh\left( \frac{L_{n' \to m}}{2} \right) \right) \tag{1} $$

b) VN update: ∀ n ∈ ∪_{m' ∈ Q_g} N(m'), m ∈ M(n)

$$ L_{n \to m} = L_n + \sum_{m' \in M(n) \setminus m} L_{m' \to n} \tag{2} $$

Step 3: Total LLR computation

∀ n, 1 ≤ n ≤ N,

$$ L_n^{total,(l)} = L_n + \sum_{m \in M(n)} L_{m \to n} \tag{3} $$

Step 4: Hard decision and stopping criterion test

a) Create D^{(l)} = [d_1^{(l)}, d_2^{(l)}, ..., d_N^{(l)}] such that
d_n^{(l)} = 0 if L_n^{total,(l)} ≥ 0 and d_n^{(l)} = 1 if
L_n^{total,(l)} < 0.

b) If D^{(l)} H^T = 0 or I_Max is reached, stop decoding and
output D^{(l)} as the decoded codeword. Otherwise, set l =
l + 1 and U = {x | 1 ≤ x ≤ M}, go to Step 1.
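
The message-passing steps above can be sketched in plain Python as follows. This is a minimal illustration, not an optimized decoder: the function name ndgsbp_decode is hypothetical, the CN groups are passed in and kept fixed across iterations (the schedule above regroups in every iteration), the product inside the inverse hyperbolic tangent is clipped for numerical safety, and the toy matrix and LLR values are made up for the example.

```python
import math


def ndgsbp_decode(H, llr, groups, max_iter=50):
    """Group-by-group (horizontal shuffled) BP with possibly overlapping CN groups."""
    M, N = len(H), len(H[0])
    N_of = [[n for n in range(N) if H[m][n]] for m in range(M)]  # N(m)
    M_of = [[m for m in range(M) if H[m][n]] for n in range(N)]  # M(n)
    c2v = {(m, n): 0.0 for m in range(M) for n in N_of[m]}       # L_{m->n}
    v2c = {(n, m): llr[n] for m in range(M) for n in N_of[m]}    # L_{n->m}
    hard = [0 if x >= 0 else 1 for x in llr]
    for _ in range(max_iter):
        for group in groups:
            # CN update for every CN of the current group.
            for m in group:
                for n in N_of[m]:
                    prod = 1.0
                    for n2 in N_of[m]:
                        if n2 != n:
                            prod *= math.tanh(v2c[(n2, m)] / 2.0)
                    prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)
                    c2v[(m, n)] = 2.0 * math.atanh(prod)
            # VN update for every VN connected to the current group.
            for n in {n for m in group for n in N_of[m]}:
                total = llr[n] + sum(c2v[(m, n)] for m in M_of[n])
                for m in M_of[n]:
                    v2c[(n, m)] = total - c2v[(m, n)]
        # Total LLR, hard decision and syndrome test.
        total = [llr[n] + sum(c2v[(m, n)] for m in M_of[n]) for n in range(N)]
        hard = [0 if t >= 0 else 1 for t in total]
        if all(sum(hard[n] for n in N_of[m]) % 2 == 0 for m in range(M)):
            break
    return hard


# Hypothetical toy example: all-zero codeword with one unreliable bit (index 2).
H_demo = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 1, 1],
]
llr_demo = [2.0, 2.0, -0.5, 2.0, 2.0, 2.0, 2.0, 2.0]
groups_demo = [[0, 1], [1, 2], [2, 3]]
print(ndgsbp_decode(H_demo, llr_demo, groups_demo))
```

Subtracting L_{m→n} from the total LLR in the VN update is equivalent to the sum over M(n)\m in the VN update equation and avoids recomputing the sum for every outgoing edge.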

III. Convergence Analysis 

As can be seen from the above description of the pro- 
posed algorithm, the messages L n ^ m and L m ^ n are real 
random variables that depend on the received channel values 
y n , the code structure and the decoding schedule. The GA 
approach assumes that they can be approximated by Gaussian 
random variables. With this approach, we need only to monitor 
the message means as the consistency condition holds in 
our case Q. We further assume that the all-zero codeword 
C = (0, 0, ..., 0), which is mapped into the BPSK modulated
vector X = (1, 1, ..., 1), is transmitted. The following analysis
is based on the ideas of [8] and [11] with two distinct
considerations. First, the analysis presented in [11] deals with
vertical GSBP while we are dealing with horizontal GSBP.
Second, the intersection among groups can be nonempty in
our schedule. For GSBP decoding, we divide CNs into two
types, updated CNs and non-updated CNs,
as depicted in Fig. 2. To analyze the effect of nonempty
intersections, we divide CNs into four classes in a given, say
the gth, sub-iteration of the lth iteration. Class-a includes the
CNs that will be updated at the g'th (g' > g) sub-iteration,
Class-b includes the CNs of the gth group which are also members
of the previous, (g - 1)th, group, Class-c contains the CNs of the
gth group which are not members of the previous (g - 1)th group,
and Class-d comprises all CNs excluding Class-a and Class-b.
Figs. 3 and 4 depict the situations after three sub-iterations for
overlapping ratios r < 0.5 and 0.5 < r < 1, respectively.

Fig. 2. An example for GSBP after two sub-iterations. The legend
distinguishes the CNs which have been processed in the previous and
current sub-iterations from the CNs which have not been processed.

Fig. 3. An example for NDGSBP after three sub-iterations when r < 0.5.

We now track the average values of all updated parameters
at the lth iteration for the proposed NDGSBP algorithm. We
first define $\mu^{c_x}_{g,(l)}$ as the mean of the message sent by a
Class-x CN, that is, $\mu^{c_x}_{g,(l)} = E\{L_{m \to n}\}$, where m
belongs to the Class-x CNs and n is a VN connected to m in the gth
sub-iteration of the lth iteration. We start with the VN update equation.
Consider the degree-i VN n which is connected to p Class-d
CNs, q Class-b CNs and i - p - q Class-a CNs. For the gth
sub-iteration of the lth iteration, we have, for g = 1,



M0 (D ( 4 ) 

+ {i-p-q- (o 
= /i +P/jL(i) +qv r g,u) (5) 
+{i-p-q- l)/i c a-i) 
where $\mu_{(l)} = \mu_{1,(l)}$ and $\mu_0 = E\{L_n\} = E\{2 y_n / \sigma^2\}$ is the mean
of the channel value. For g > 1, we obtain (6)
for r < 0.5 and (7) for 0.5 < r < 1.

Fig. 4. An example for NDGSBP after three sub-iterations when 0.5 < r < 1.

When the CNs in the gth group are processed in the lth
iteration, the mean of the message for degree-i VNs, $\mu^{(i)}_{g,(l)}$, can be
obtained by accumulating all possible values of $\mu^{(i,p,q)}_{g,(l)}$ with
their corresponding coefficients ω(i, p, q):

$$ \mu^{(i)}_{g,(l)} = \sum_{p=0}^{i-1} \sum_{q=0}^{i-1-p} \omega(i,p,q)\, \mu^{(i,p,q)}_{g,(l)}, \tag{8} $$

where ω(i, p, q) is the proportion of degree-i VNs which have
p neighboring Class-d CNs and q neighboring Class-b CNs among all
degree-i VNs. Thus ω(i, p, q) is given by

$$ \omega(i,p,q) = \begin{cases} \binom{i-1}{p} x^{p} (1-x)^{i-1-p}, & g = 1 \\ \binom{i-1}{p} \binom{i-1-p}{q} y^{p} z^{q} (1-y-z)^{i-1-p-q}, & g \neq 1 \end{cases} \tag{9} $$

where x is the fraction of Class-d CNs for g = 1, y is the
fraction of Class-d CNs and z is the fraction of Class-b CNs.
Thus

$$ x = \frac{1}{G-(G-1)r}, \tag{10} $$

$$ y = \frac{g(1-r)}{G-(G-1)r}, \tag{11} $$

$$ z = \frac{r}{G-(G-1)r}. \tag{12} $$
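
The weights ω(i, p, q) can be evaluated numerically as a sanity check. The sketch below assumes the following reading of the class fractions, derived from M = N_G (G - (G - 1) r): x = 1/(G - (G - 1)r), y = g(1 - r)/(G - (G - 1)r) and z = r/(G - (G - 1)r). It further assumes that for g = 1 only q = 0 carries weight; the function names are hypothetical. Under these assumptions the weights of a degree-i VN sum to one.

```python
from math import comb


def class_fractions(G, r, g):
    """Assumed fractions (x, y, z) of CN classes at sub-iteration g."""
    denom = G - (G - 1) * r
    return 1.0 / denom, g * (1.0 - r) / denom, r / denom


def omega(i, p, q, G, r, g):
    """Weight of degree-i VNs with p Class-d and q Class-b neighbors among i-1 edges."""
    x, y, z = class_fractions(G, r, g)
    if g == 1:
        # No previous group exists in the first sub-iteration (assumption: q = 0 only).
        return comb(i - 1, p) * x**p * (1 - x) ** (i - 1 - p) if q == 0 else 0.0
    return (comb(i - 1, p) * comb(i - 1 - p, q)
            * y**p * z**q * (1 - y - z) ** (i - 1 - p - q))


def total_weight(i, G, r, g):
    """Sum of omega over all admissible (p, q); should be 1."""
    return sum(omega(i, p, q, G, r, g)
               for p in range(i) for q in range(i - p))


print(total_weight(3, 12, 0.4, 1), total_weight(3, 12, 0.4, 5))
```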
From the Class-c CN updating formula, we can obtain (13).
Under the Gaussian approximation and for μ > 0, define

$$ \phi(\mu) = 1 - \frac{1}{\sqrt{4\pi\mu}} \int_{-\infty}^{\infty} \tanh\left(\frac{\tau}{2}\right) \exp\left( -\frac{(\tau-\mu)^2}{4\mu} \right) d\tau, \tag{14} $$

and (13) can be rewritten, for a degree-j CN, as

$$ \mu_{g,(l)} = \phi^{-1}\left( 1 - \left( 1 - \sum_{i=2}^{d_v} \lambda_i\, \phi\left(\mu^{(i)}_{g,(l)}\right) \right)^{j-1} \right). $$

If we average over all CN degrees j, we have

$$ \mu_{g,(l)} = \sum_{j=2}^{d_c} \rho_j\, \phi^{-1}\left( 1 - \left( 1 - \sum_{i=2}^{d_v} \lambda_i\, \phi\left(\mu^{(i)}_{g,(l)}\right) \right)^{j-1} \right). \tag{15} $$

The mean of the message sent from a Class-b CN, $\mu'_{g,(l)}$, is
computed by replacing $\mu^{(i)}_{g,(l)}$ with $\mu'^{(i)}_{g,(l)}$ in (15), where
$\mu'^{(i)}_{g,(l)}$ is the mean of the message sent from a VN that overlaps
with the previous group; $\mu'^{(i)}_{g,(l)}$ is obtained by letting p be at
least 1 in (8) and (9) for g ≠ 1.

After l iterations, the mean of the message passed from a
CN, $\mu_{c,(l)}$, is

$$ \mu_{c,(l)} = \frac{r}{G-(G-1)r}\, \mu'_{G,(l)} + \frac{G-Gr}{G-(G-1)r}\, \mu_{G,(l)}. $$

If $\mu_{c,(l)} \to \infty$, the connecting VNs achieve error-free
performance.

Fig. 5. BER and FER performance of Mackay's (504,252) regular LDPC
code with d_c = 6 and d_v = 3 using the decoding algorithms: NDGSBP,
GSBP for G = 12 and standard BP.
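
Since the GA function φ and its inverse have no closed form, a numerical evaluation is needed when tracking the means. The sketch below assumes the standard GA definition φ(μ) = 1 - (4πμ)^(-1/2) ∫ tanh(τ/2) exp(-(τ - μ)^2 / (4μ)) dτ for μ > 0, evaluates it with a trapezoidal rule over μ ± 10 standard deviations, and inverts it by bisection. The function names, the integration range, the step count and the bisection interval are all choices made for this sketch; cn_mean illustrates the check-node mean update for a single VN mean and CN degree j.

```python
import math


def phi(mu, steps=2000):
    """phi(mu) = 1 - E[tanh(L/2)] for L ~ N(mu, 2*mu), by trapezoidal integration."""
    if mu <= 0:
        return 1.0
    sigma = math.sqrt(2.0 * mu)
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        t = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * math.tanh(t / 2.0) * math.exp(-((t - mu) ** 2) / (4.0 * mu))
    return 1.0 - total * h / math.sqrt(4.0 * math.pi * mu)


def phi_inv(y, lo=1e-6, hi=200.0, iters=60):
    """Invert phi by bisection; phi is decreasing in mu."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cn_mean(mu_v, j):
    """Mean of a degree-j CN output message when all inputs have mean mu_v."""
    return phi_inv(1.0 - (1.0 - phi(mu_v)) ** (j - 1))


print(phi(2.0))
print(cn_mean(2.0, 6))
```

For j > 2 the CN output mean is smaller than the input mean, which reflects the fact that a check node can only be as reliable as its least reliable inputs.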

IV. Numerical Results 

Fig. 5 depicts the FER and BER performance of Mackay's
(504,252) regular LDPC code with d_c = 6, d_v = 3 using the
standard BP algorithm, the GSBP algorithm (G = 12) and
the proposed NDGSBP algorithm (G = 12, overlapping ratio
r = 0.4). Similarly, Fig. 6 shows the FER and
BER performance of Mackay's (816,544) regular LDPC code
with d_c = 6 and d_v = 4 using the standard BP algorithm,
the GSBP algorithm (G = 16) and the proposed NDGSBP
algorithm (G = 16, overlapping ratio r = 0.4).

The simulation results reported in this section assume
I_Max = 1000 for the GSBP and BP algorithms. For a
fair comparison, we choose system parameter values that
result in the same or similar computational complexity for all
algorithms. For example, decoding the (504,252) LDPC code
using the NDGSBP decoder with G = 12 and r = 0.4
implies that N_G = 34, and the decoder is allowed at most
rnHG-^Na.r = dwfb « ^27 decoding iterations. 

Fig. 5 indicates that at a BER of 10^{-5}, the NDGSBP decoder
is about 0.2 dB better than the standard BP decoder, and
achieves about 0.08 dB decoding gain with respect to the
GSBP decoder. Fig. 6 also verifies that the performance of the
NDGSBP algorithm is superior to that of the BP and GSBP
algorithms for the (816,544) LDPC code.

Fig. 6. BER and FER performance of Mackay's (816,544) regular LDPC
code with d_c = 6 and d_v = 4 using the decoding algorithms: NDGSBP,
GSBP for G = 16 and standard BP.

We use the GA approach outlined in Section III to analyze
the performance of the NDGSBP, BP and GSBP decoders.
Given the code rate and degree distribution of LDPC codes,
the thresholds estimated by the GA approach for BP, GSBP
and NDGSBP decoding are the same. In Table I we list the
number of iterations required for error-free performance at an
SNR equal to the threshold. We examine the NDGSBP performance
in decoding two LDPC code ensembles using the same overlapping
ratio r = 0.4 but different group numbers G. The table shows that
the NDGSBP decoder consistently outperforms the other two
decoders in convergence rate.

TABLE I

Number of decoding iterations required to achieve error-free
performance for the BP, GSBP and NDGSBP (r = 0.4) decoders
in a binary-input AWGN channel.

V. Conclusions

In this paper, we propose a new group shuffled BP decoding
scheduling method to improve the performance of LDPC
codes. Our scheme enhances the connectivity of the code
graph by having overlapped CNs in neighboring CN groups.
The enhanced connectivity allows each VN (or CN) to
obtain related information from more VNs (or CNs) within a
decoding iteration, accelerating the message-passing rate and
thus the convergence speed.

The GA approach is used to track the first-order statistical
information flow of the proposed NDGSBP algorithm. The
GA analysis verifies that the NDGSBP decoder converges
faster than the GSBP and BP decoders. Numerical results also
demonstrate that, with the same decoding computation complexity,
the new algorithm yields BER and FER performance better than that
of the conventional BP and GSBP decoders.

In this work, the CN order in grouping is arbitrary and the
non-disjoint parts are randomly selected from the available
CNs. A proper CN ordering and overlapping CN selection that
take the code structure into account should give better
performance. The optimal decoding schedule and parameters
(r, ℓ) remain to be found, and some analytic performance
metrics may be needed in the search for the desired solution.



